Gesture based interactive control of electronic equipment

ABSTRACT

A computer-implemented method for controlling one or more electronic devices by recognition of gestures made by a three-dimensional (3D) object. In one example embodiment, the method comprises capturing a series of successive 3D images in real time, identifying that the object has a predetermined elongated shape, identifying that the object is oriented substantially towards a predetermined direction, determining at least one qualifying action being performed by a user and/or the object, comparing the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented, and, based on the comparison, selectively issuing to the one or more electronic devices a command associated with the at least one qualifying action.

RELATED APPLICATIONS

This application is a Continuation-in-Part of Russian Patent Application Serial No. 2011127116, filed Jul. 4, 2011, which is incorporated herein by reference in its entirety for all purposes.

BACKGROUND

1. Technical Field

This disclosure relates generally to computer interfaces and, more particularly, to methods for controlling electronic equipment by recognition of gestures made by an object.

2. Description of Related Art

The approaches described in this section could be pursued but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art, merely by virtue of their inclusion in this section.

Interactive gesture interface systems are commonly used to interact with various electronic devices including gaming consoles, Television (TV) sets, computers, and so forth. The general principle of such systems is to detect human gestures or motions made by users, and generate commands based thereupon that cause electronic devices to perform certain actions. Gestures can originate from any bodily motion or state, but commonly originate from the face or hand. However, some systems also include emotion recognition features.

The gesture interface systems may be based on various gesture recognition approaches that involve the utilization of cameras, motion sensors, acceleration sensors, position sensors, electronic handheld controllers, and so forth. Whichever approach is used, human gestures can be captured and recognized, and a particular action can be triggered by an electronic device. Particular examples may include wireless electronic handheld controllers, which enable users to control gaming consoles by detecting motions or gestures made by such controllers. While such systems have become very popular, they are still quite complex and require the utilization of various handheld controllers that are typically different for different applications.

Another approach involves utilization of 3D sensor devices capable of recognizing users' gestures or motions without dedicated handheld controllers or the like. Gestures are identified by processing users' images obtained by such 3D-sensors, and then they are interpreted to generate a control command. Control commands can be used to trigger particular actions performed by electronic equipment coupled to the 3D-sensor. Such systems are now widely deployed and generally used for gaming consoles.

One of the major drawbacks of such systems is that they are not flexible and cannot generate control commands for multiple electronic devices concurrently connected to a single 3D-sensor or any other device for capturing human motions or gestures. Thus, the conventional technology fails to provide a technique for improved detection and interpretation of human gestures associated with a particular electronic device among a plurality of devices connected to the common 3D-sensor.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In accordance with various embodiments and the corresponding disclosure thereof, methods and systems are provided for controlling one or more electronic devices by recognition of gestures made by an object. The described methodologies enable users to interact with one or a plurality of electronic devices such as gaming consoles, computers, audio systems, video systems, and so forth. The interaction with various electronic devices can be performed with the help of at least one 3D-sensor configured to recognize not only gestures, but also the particular electronic device among the plurality of electronic devices to which the gestures are directed.

In accordance with one aspect, there is provided a computer-implemented method for controlling one or more electronic devices by recognition of gestures made by an object. The method may comprise capturing a series of successive 3D images in real time and identifying the object. The object may have a predetermined elongated shape. The method may also comprise identifying that the object is oriented substantially towards a predetermined direction, determining at least one qualifying action being performed by a user and/or the object, comparing the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented, and, based on the comparison, selectively providing to the one or more electronic devices a command associated with the at least one qualifying action.

In some embodiments, the predetermined direction can be associated with the one or more electronic devices. The object can be selected from a group comprising a wand, an elongated pointing device, an arm, a hand, and one or more fingers of the user. The series of successive 3D images can be captured using at least one video camera or a 3D image sensor. In some examples, the object can be identified by performing one or more of: processing the captured series of successive 3D images to generate a depth map, determining geometrical parameters of the object, and identifying the object by matching the geometrical parameters to a predetermined object database. The determination of at least one qualifying action may comprise determining and acknowledging one or more of: a predetermined motion of the object, a predetermined gesture of the object, a gaze of the user towards the predetermined direction associated with one or more electronic devices, a predetermined motion of the user, a predetermined gesture of the user, biometric data of the user, and a voice command provided by the user. Biometric data of the user can be determined based on one or more of the following: face recognition, voice recognition, user body recognition, and recognition of a user motion dynamics pattern. The gaze of the user can be determined based on one or more of the following: a position of the eyes of the user, a position of the pupils or a contour of the irises of the eyes of the user, a position of the head of the user, an angle of inclination of the head of the user, and a rotation of the head of the user. The mentioned one or more electronic devices may comprise a computer, a game console, a TV set, a TV adapter, a communication device, a Personal Digital Assistant (PDA), a lighting device, an audio system, and a video system.

According to another aspect, there is provided a system for controlling one or more electronic devices by recognition of gestures made by an object. The system may comprise at least one 3D image sensor configured to capture a series of successive 3D images in real time and a computing unit communicatively coupled to the at least one 3D image sensor. The computing unit can be configured to: identify the object; identify that the object is oriented substantially towards a predetermined direction; determine at least one qualifying action being performed by a user and/or the object; compare the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented; and, based on the comparison, selectively provide to the one or more electronic devices a command associated with the at least one qualifying action.

In some example embodiments, the at least one 3D image sensor may comprise one or more of an infrared (IR) projector to generate modulated light, an IR camera to capture 3D images associated with the object or the user, and a color video camera. The IR projector, color video camera, and IR camera can be installed in a common housing. The color video camera and/or IR camera can be equipped with liquid lenses. The mentioned predetermined direction can be associated with the one or more electronic devices. The object can be selected from a group comprising a wand, an elongated pointing device, an arm, a hand, and one or more fingers of the user. The computing unit can be configured to identify the object by performing the acts of: processing the captured series of successive 3D images to generate a depth map, determining geometrical parameters of the object, and identifying the object by matching the geometrical parameters to a predetermined object database. Furthermore, the computing unit can be configured to determine at least one qualifying action by performing the acts of determining and acknowledging one or more of: a predetermined motion of the object, a predetermined gesture of the object, a gaze of the user towards the predetermined direction associated with one or more electronic devices, a predetermined motion of the user, a predetermined gesture of the user, biometric data of the user, and a voice command provided by the user. Biometric data of the user can be determined based on one or more of the following: face recognition, voice recognition, user body recognition, and recognition of a user motion dynamics pattern. The gaze of the user can be determined based on one or more of the following: a position of the eyes of the user, a position of the pupils or a contour of the irises of the eyes of the user, a position of the head of the user, an angle of inclination of the head of the user, and a rotation of the head of the user.

According to yet another aspect, there is provided a processor-readable medium. The medium may store instructions, which when executed by one or more processors, cause the one or more processors to: capture a series of successive 3D images in real time; identify the object; identify that the object is oriented substantially towards a predetermined direction; determine at least one qualifying action being performed by a user and/or the object; compare the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented; and, based on the comparison, selectively provide to the one or more electronic devices a command associated with the at least one qualifying action.

To the accomplishment of the foregoing and related ends, the one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are indicative, however, of but a few of the various ways in which the principles of various aspects may be employed, and this description is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is a general illustration of a scene suitable for implementing methods for controlling one or more electronic devices by recognition of gestures made by an object.

FIG. 2 shows an example system environment suitable for implementing methods for controlling one or more electronic devices by recognition of gestures made by an object.

FIG. 3 shows an example embodiment of the 3D-sensor.

FIG. 4 is a diagram of the computing unit, according to an example embodiment.

FIG. 5 is a process flow diagram showing a method for controlling one or more electronic devices by recognition of gestures made by the object, according to an example embodiment.

FIG. 6 is a diagrammatic representation of an example machine in the form of a computer system within which a set of instructions for the machine to perform any one or more of the methodologies discussed herein is executed.

DETAILED DESCRIPTION

The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents. In this document, the terms “a” and “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive “or,” such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.

The techniques of the embodiments disclosed herein may be implemented using a variety of technologies. For example, the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a carrier wave, disk drive, or computer-readable medium. Exemplary forms of carrier waves may take the form of electrical, electromagnetic, or optical signals conveying digital data streams along a local network or a publicly accessible network such as the Internet.

The embodiments described herein relate to computer-implemented methods and systems for controlling one or more electronic devices by recognition of gestures made by an object. The object, as used herein, may refer to any elongated object having a prolonged shape and may include, for example, a wand, an elongated handheld pointing device, an arm, a hand, or one or more fingers of the user. Thus, according to the inventive approaches described herein, gestures can be made by users with either their hands (arms or fingers) or handheld, elongated objects. In some embodiments, gestures can be made by handheld objects in combination with motions of arms, fingers, or other body parts of the user.

In general, one or more 3D-sensors or video cameras can be used to recognize gestures. In the context of this document, various techniques for gesture identification and recognition can be used, and accordingly, various devices can be utilized. In one example embodiment, a single 3D-sensor can be used and may include an IR projector, an IR camera, and an optional color video camera, all embedded within a single housing. Image processing and interpretation can be performed by any computing device coupled to or embedding the 3D-sensor. Some examples may include a tabletop computer, laptop, tablet computer, gaming console, audio system, video system, phone, smart phone, PDA, or any other wired or wireless electronic device. Based on image processing and interpretation, a particular control command can be generated and outputted by the computing device. For example, the computing device may recognize a particular gesture associated with a predetermined command and generate such command for further input into a particular electronic device selected from a plurality of electronic devices. For instance, one command generated by the computing device and associated with a first gesture can be inputted to a gaming console, while another command, associated with a second gesture, can be inputted to an audio system. In other words, the computing device can be coupled to multiple electronic devices of the same or various types, and such electronic devices can be selectively controlled by the user.

In some example embodiments, the computing device may be integrated with one or more controlled electronic device(s). For instance, the computing device and optional 3D-sensor can be integrated with a gaming console. This gaming console can be configured to be coupled to other electronic devices such as a lighting device, audio system, video system, TV set, and so forth. Those skilled in the art would appreciate that the 3D-sensor, the computing device, and various controlled electronic devices can be integrated with each other or interconnected in numerous different ways. It should also be understood that such systems may constitute at least part of an “intelligent house” and may be used as part of home automation systems.

To select a particular electronic device and generate a control command for such electronic device, the user should perform two actions either concurrently or in series. The first action includes pointing the object towards the particular device. This may include posing the elongated object such that it is substantially oriented towards the particular device to be controlled. For example, the user may point at the device with an index finger. Alternatively, the user may orient an arm or hand towards the device. In some other examples, the user may orient the handheld object (e.g., a wand) towards the electronic device. In general, it should be understood that any elongated object can be used to designate a particular electronic device for further action. Such an elongated object may or may not include electronic components.

To generate a particular control command to the selected electronic device, the user should perform the second action, which as used herein is referred to as a “qualifying action.” Once the interface system identifies that the user has performed the first and the second actions, a predetermined control command is generated for a desired electronic device. The qualifying action may include one or more different actions. In some embodiments, the qualifying action may refer to a predetermined motion or gesture made by the object. For example, the user may first point to an electronic device to “select” it, and then make a certain gesture (e.g., make a circle motion, a nodding motion, or any other predetermined motion or gesture) to trigger the generation and output of a control command associated with the recognized gesture and the “selected” electronic device. In some other embodiments, the qualifying action may include a predetermined motion or gesture of the user. For example, the user may at first point to an electronic device, and then make a certain gesture with the hand or head (e.g., the user may tap an index finger on a wand held in the hand).
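
By way of a purely illustrative, non-limiting example, the two-step interaction described above (pointing to select a device, then performing a qualifying action to trigger a command) could be organized as a small state machine. The device names, gesture labels, and command strings below are hypothetical placeholders and are not prescribed by this disclosure:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class GestureInterface:
    """Maps (selected device, qualifying gesture) to a control command."""
    command_table: Dict[Tuple[str, str], str]
    selected_device: Optional[str] = None

    def on_pointing(self, device: str) -> None:
        # First action: the elongated object is oriented towards a device.
        self.selected_device = device

    def on_qualifying_action(self, gesture: str) -> Optional[str]:
        # Second action: a qualifying gesture triggers the associated command.
        if self.selected_device is None:
            return None  # no device selected yet; ignore the gesture
        return self.command_table.get((self.selected_device, gesture))


if __name__ == "__main__":
    ui = GestureInterface(command_table={
        ("tv", "circle"): "TV_POWER_TOGGLE",
        ("audio_system", "nod"): "AUDIO_PLAY_PAUSE",
    })
    ui.on_pointing("tv")                      # the user points the wand at the TV set
    print(ui.on_qualifying_action("circle"))  # -> "TV_POWER_TOGGLE"
```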

Furthermore, in some other embodiments, the qualifying action may include a gaze of the user towards the predetermined direction associated with one or more electronic devices. For example, the user may point to an electronic device while also looking at it. Such a combination of actions can be unequivocally interpreted by the interface system to mean that a certain command is to be performed. The qualifying action may also refer to a voice command generated by the user. For example, the user may point to a TV set and say, “turn on,” to generate a turn-on command. In some embodiments, the qualifying action may include receipt and identification of biometric data associated with the user. The biometric data may include a face, a voice, a motion dynamics pattern, and so forth. For example, face recognition or voice recognition can be used to authorize the user to control certain electronic devices.

In some additional embodiments, the interface system may require the user to perform at least two or more qualifying actions. For example, to generate a particular control command for an electronic device, the user shall first use the object to point out the electronic device, then make a predetermined gesture using the object, and then provide a voice command. In another example, the user may point towards the electronic device, make a gesture, and turn his or her face towards the 3D-sensor for further face recognition and authentication. It should be understood that various combinations of qualifying actions can be performed and predetermined for generation of a particular command.

The interface system may include a database of predetermined gestures, objects, and related information. Once a gesture is captured by the 3D-sensor, the computing device may compare the captured gesture with the list of predetermined gestures to find a match. Based on such comparison, a predetermined command can be generated. Accordingly, the database may store and populate a list of predetermined commands, each of which is associated with a particular device and a particular qualifying action (or combination of qualifying actions). It should also be understood that locations of various electronic devices can be pre-programmed in the system, or alternatively, they can be identified by the 3D-sensor in real time. For this purpose, the electronic devices can be provided with tags to be attached to their surfaces. Those skilled in the art would appreciate that various techniques can be used to identify electronic devices for the interface system.
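
As one hypothetical sketch of the gesture-matching step mentioned above, a captured motion trajectory could be resampled and compared against stored reference gestures, with the closest match (within a tolerance) selecting the predetermined command. The resampling length, tolerance, and gesture names below are assumptions for illustration only; the disclosure does not prescribe a particular matching algorithm:

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def resample(path: List[Point], n: int = 32) -> List[Point]:
    """Linearly resample a trajectory to n points so paths of different lengths compare."""
    out: List[Point] = []
    for i in range(n):
        t = i * (len(path) - 1) / (n - 1)
        j = min(int(t), len(path) - 2)
        a, b = path[j], path[j + 1]
        f = t - j
        out.append((a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1])))
    return out


def trajectory_distance(a: List[Point], b: List[Point]) -> float:
    """Mean point-wise Euclidean distance between two resampled trajectories."""
    ra, rb = resample(a), resample(b)
    return sum(math.dist(p, q) for p, q in zip(ra, rb)) / len(ra)


def match_gesture(captured: List[Point],
                  templates: Dict[str, List[Point]],
                  tolerance: float = 0.2) -> Optional[str]:
    """Return the name of the closest stored gesture, or None if nothing is close enough."""
    # Templates are assumed to contain at least two points each.
    if len(captured) < 2 or not templates:
        return None
    best = min(templates, key=lambda name: trajectory_distance(captured, templates[name]))
    return best if trajectory_distance(captured, templates[best]) <= tolerance else None
```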

Referring now to the drawings, FIG. 1 is a general illustration of a scene 100 suitable for implementing methods for controlling one or more electronic device(s) by recognition of gestures made by an object. In particular, FIG. 1 shows a user 102 holding a handheld elongated object 104, which can be used for interaction with an interface system 106. The interface system 106 may include both a 3D-sensor 108 and a computing unit 110, which can be stand-alone devices or can be embedded within a single housing.

The 3D-sensor 108 can be configured to capture a series of 3D images, which can be further transmitted to and processed by the computing unit 110. As a result of the image processing, the computing unit 110 may first identify the object 104 and its relative orientation in a certain direction, and second, identify one or more “qualifying actions” as discussed above (e.g., identify a gesture made by the user 102 or the object 104).

The interface system 106 may be operatively connected with various electronic devices 112-118. The electronic devices 112-118 may include any device capable of receiving electronic control commands and performing one or more certain actions upon receipt of such commands. For example, the electronic devices 112-118 may include desktop computers, laptops, tabletop computers, tablet computers, cellular phones, smart phones, PDAs, gaming consoles, TV sets, TV adapters, displays, audio systems, video systems, lighting devices, home appliances, or any combination or part thereof. According to the example shown in FIG. 1, there is a TV set 112, an audio system 114, a gaming console 116, and a lighting device 118. The electronic devices 112-118 are all operatively coupled to the interface system 106, as further depicted in FIG. 2. In some example embodiments, the interface system 106 may integrate one or more electronic devices (not shown). For example, the interface system 106 may be embedded in a gaming console 116 or desktop computer. Those skilled in the art should understand that various interconnections may be deployed for the devices 112-118.

The user 102 may interact with the interface system 106 by making gestures or various motions with his or her hands, arms, fingers, legs, head, or other body parts; by making gestures or motions using the object 104; by making voice commands; by looking in a certain direction; or by any combination thereof. All of these motions, gestures, and voice commands can be predetermined so that the interface system 106 is able to identify them, match them to the list of pre-stored user commands, and generate a particular command for the electronic devices 112-118. In other words, the interface system 106 may be “taught” to identify and differentiate one or more motions or gestures.

The object 104 may be any device of elongated shape and design. One example of the object 104 may include a wand or elongated pointing device. It is important to note that the object 104 may be free of any electronics. It could be any article of prolonged shape. Although it is not described in this document (so as not to detract from the general principles), the interface system 106 may be trained to identify and differentiate the object 104 as used by the user 102. The electronics-free object 104 may have a different design and may imitate various pieces of sporting equipment (e.g., a baseball bat, racket, machete, sword, steering wheel, and so forth). In some embodiments, the object 104 may have a specific color design or color tags. Such color tags or colored areas may have various designs and shapes, and in general, they may help facilitate better identification of the object 104 by the interface system 106.

FIG. 2 shows an example system environment 200 suitable for implementing methods for controlling one or more electronic device(s) by recognition of gestures made by an object. The system environment 200 comprises the interface system 106, one or more electronic devices 210, and a network 220.

The interface system 106 may include at least one 3D-sensor 108, the computing unit 110, a communication unit 230, and an optional input unit 240. All of these units 108, 110, 230, and 240 can be operatively interconnected. The 3D-sensor 108 may be implemented in various ways and may include an image capture device. Further details about the 3D-sensor 108 are described below with reference to FIG. 3. It should also be appreciated that the interface system 106 may include two or more 3D-sensors 108 spaced apart from each other.

The aforementioned one or more electronic device(s) 210 are, in general, any devices configured to trigger one or more predefined action(s) upon receipt of a certain control command. Some examples of electronic devices 210 include, but are not limited to, computers, displays, audio systems, video systems, gaming consoles, and lighting devices. In one embodiment, the system environment 200 may comprise multiple electronic devices 210 of different types, while in another embodiment, the multiple electronic devices 210 may be of the same type (e.g., two or more interconnected gaming consoles are used).

The communication unit 230 may be configured to transfer data between the interface system 106 and one or more electronic device(s) 210. The communication unit 230 may include any wireless or wired network interface controller, including, for example, a Local Area Network (LAN) adapter, Wide Area Network (WAN) adapter, Wireless Transmit Receiving Unit (WTRU), WiFi adapter, Bluetooth adapter, GSM/CDMA adapter, and so forth.
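
Purely as an illustrative assumption, the communication unit 230 could deliver a generated control command to an electronic device 210 over a plain TCP socket; the address and the newline-delimited text protocol shown below are hypothetical, since the disclosure only requires some wired or wireless network interface:

```python
import socket


def send_command(host: str, port: int, command: str, timeout: float = 2.0) -> None:
    """Open a short-lived TCP connection and send one newline-terminated command."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall((command + "\n").encode("utf-8"))


# Example (assumes a device listening at this hypothetical address):
# send_command("192.168.1.20", 5000, "TV_POWER_TOGGLE")
```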

The input unit 240 may be configured to enable users to input data of any nature. In one example, the input unit 240 may include a keyboard or ad hoc buttons allowing the users to input commands, program an interface, customize settings, and so forth. According to another example, the input unit 240 includes a microphone to capture user voice commands, which can then be processed by the computing unit 110. Various input technologies can be used in the input unit 240, including touch screen technologies, pointing devices, and so forth.

The network 220 may couple the interface system 106 and one or more electronic device(s) 210. The network 220 is a network of data processing nodes interconnected for the purpose of data communication and may be utilized to communicatively couple various components of the environment 200. The network 220 may include the Internet or any other network capable of communicating data between devices. Suitable networks may include or interface with any one or more of the following: local intranet, PAN (Personal Area Network), LAN, WAN, MAN (Metropolitan Area Network), virtual private network (VPN), storage area network (SAN), frame relay connection, Advanced Intelligent Network (AIN) connection, synchronous optical network (SONET) connection, digital T1, T3, E1 or E3 line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, Ethernet connection, ISDN (Integrated Services Digital Network) line, dial-up port such as a V.90, V.34 or V.34bis analog modem connection, cable modem, ATM (Asynchronous Transfer Mode) connection, or an FDDI (Fiber Distributed Data Interface) or CDDI (Copper Distributed Data Interface) connection. Furthermore, communications may also include links to any of a variety of wireless networks, including WAP (Wireless Application Protocol), GPRS (General Packet Radio Service), GSM, CDMA or TDMA (Time Division Multiple Access), cellular phone networks, GPS, CDPD (cellular digital packet data), RIM (Research in Motion, Limited) duplex paging network, Bluetooth radio, or an IEEE 802.11-based radio frequency network. The network 220 can further include or interface with any one or more of the following: RS-232 serial connection, IEEE-1394 (Firewire) connection, Fiber Channel connection, IrDA (infrared) port, SCSI (Small Computer Systems Interface) connection, USB (Universal Serial Bus) connection, or other wired or wireless, digital or analog interface or connection, mesh or Digi® networking.

FIG. 3 shows an example embodiment of the 3D-sensor 108. In some embodiments, the 3D-sensor 108 may comprise at least a color video camera 310 configured to capture images. In some other embodiments, the 3D-sensor 108 may include an IR projector 320 to generate modulated light and also an IR camera 330 to capture 3D images associated with the object 104 or the user 102. In yet other exemplary embodiments, the 3D-sensor 108 may comprise the color video camera 310, IR projector 320, and IR camera 330. In an example, the color video camera 310, IR projector 320, and IR camera 330 are all encased within a single housing.

Furthermore, in some embodiments, the 3D-sensor 108 may also comprise a computing module 340 for image analysis, pre-processing, processing, or generation of commands for the color video camera 310, IR projector 320, or IR camera 330. In some other examples, such operations can be done by the computing unit 110. The 3D-sensor 108 may also include a bus 350 interconnecting the color video camera 310, IR projector 320, and/or IR camera 330, depending on which devices are used.

The 3D-sensor 108 may also include one or more liquid lenses 360, which can be used for the color video camera 310, IR camera 330, or both. In general, liquid lenses 360 can be used to adaptively focus cameras onto a certain object or objects. The liquid lens 360 may use one or more fluids to create an infinitely-variable lens without any moving parts, by controlling the meniscus (the surface of the liquid). The control of the liquid lens 360 may be performed by the computing module 340 or the computing unit 110.

Additional details of the 3D-sensor 108 and how captured image data can be processed are disclosed in Russian Patent Application Serial No. 2011127116, which is incorporated herein by reference in its entirety.

FIG. 4 is a diagram of the computing unit 110, according to an example embodiment. As shown in the figure, the computing unit 110 may comprise an identification module 410, orientation module 420, qualifying action module 430, comparing module 440, command generator 450, and database 460. In other embodiments, the computing unit 110 may include additional, fewer, or different modules for various applications. Furthermore, all modules can be integrated within a single system, or alternatively, can be remotely located and optionally accessed via a third party.

The identification module 410 can be configured to identify the object 104 and/or the user 102. The identification process may include processing the series of successive 3D images as captured by the 3D-sensor 108. A depth map is generated as the result of such processing. Further processing of the depth map enables the determination of geometrical parameters of the object 104 or the user 102. For example, a virtual hull or skeleton can be created. Once geometrical parameters are defined, the object 104 can be identified by matching these geometrical parameters to predetermined objects as stored in the database 460.
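
A minimal sketch of this identification step is given below, assuming the depth map has already been segmented into a point cloud for the candidate object. The elongation test uses principal component analysis; the thresholds and the object database format are illustrative assumptions rather than values taken from this disclosure:

```python
import numpy as np


def geometrical_parameters(points: np.ndarray) -> dict:
    """points: (N, 3) array of 3D points belonging to the candidate object."""
    centered = points - points.mean(axis=0)
    # Eigenvalues of the covariance matrix describe spread along the principal axes.
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
    length = float(np.sqrt(max(eigvals[-1], 0.0)))   # spread along the main axis
    width = float(np.sqrt(max(eigvals[0], 1e-12)))   # spread along the smallest axis
    return {
        "length": length,
        "elongation": length / width,                # large for wand-like objects
        "axis": eigvecs[:, -1],                      # unit vector of the main axis
    }


def identify_object(points: np.ndarray, object_db: dict, min_elongation: float = 4.0):
    """Match the measured parameters against a predetermined object database."""
    params = geometrical_parameters(points)
    if params["elongation"] < min_elongation:
        return None  # not an elongated object
    for name, ref in object_db.items():
        if abs(params["length"] - ref["length"]) <= ref.get("length_tol", 0.05):
            return name, params
    return None
```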

The orientation module 420 can be configured to identify that the object 104 is oriented substantially towards a predetermined direction. More specifically, the orientation module 420 can track movements of the object 104 so as to identify that the object 104 is oriented towards a certain direction for a predetermined period of time. Such directions can be associated with the electronic devices 210 or the interface system 106. It should be understood that the positions of the electronic devices 210 can be preliminarily stored in the database 460, or the interface system 106 can be trained to identify and store locations associated with the electronic devices 210. In some embodiments, the interface system 106 can obtain and store images of various electronic devices 210 such that, in the future, they can be easily identified. In some further embodiments, the electronic devices 210 can be provided with tags (e.g., color tags, RFID tags, bar code tags, and so forth). Once they are identified, the interface system 106 can associate the tags with certain locations in a 3D space. Those skilled in the art would appreciate that various approaches can be used to identify the electronic devices 210 and their associated locations, so that the orientation of the object 104 towards such locations can be easily identified by the interface system 106.
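
One possible form of the orientation test, offered only as an illustrative sketch, compares the object's main axis against the direction to each known device location and confirms a selection once the angular deviation stays small for a dwell time. The angle threshold and dwell time are assumptions, not values from this disclosure:

```python
import numpy as np


def pointed_device(object_pos, object_axis, device_locations, max_angle_deg: float = 10.0):
    """Return the device whose direction best aligns with the object's main axis, if any."""
    axis = np.asarray(object_axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    best_name, best_angle = None, max_angle_deg
    for name, location in device_locations.items():
        direction = np.asarray(location, dtype=float) - np.asarray(object_pos, dtype=float)
        direction = direction / np.linalg.norm(direction)
        # The sign of the principal axis is ambiguous, so compare against |cos| of the angle.
        angle = np.degrees(np.arccos(np.clip(abs(float(axis @ direction)), 0.0, 1.0)))
        if angle < best_angle:
            best_name, best_angle = name, angle
    return best_name


class DwellTracker:
    """Confirms a selection only after the same device has been pointed at for dwell_s seconds."""

    def __init__(self, dwell_s: float = 0.5):
        self.dwell_s = dwell_s
        self._device = None
        self._since = 0.0

    def update(self, device, timestamp: float):
        if device != self._device:
            self._device, self._since = device, timestamp
            return None
        if device is not None and timestamp - self._since >= self.dwell_s:
            return device
        return None
```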

The qualifying action module 430 can be configured to track motions of the user 102 or the object 104, and determine at least one qualifying action being performed by the user 102 or the object 104. As mentioned, the qualifying action may include one or more different actions. In some embodiments, the qualifying action may refer to a predetermined motion or gesture made by the object 104. For example, a nodding motion or a circle motion can be considered a qualifying action. In some other embodiments, the qualifying action may include a predetermined motion or gesture of the user 102. For example, the user 102 may perform a gesture with the hand or head. There are no restrictions on such gestures. The only requirement is that the interface system 106 should be able to differentiate and identify them. Accordingly, it should be understood that the interface system 106 can store reference motions or reference gestures in the database 460. In some embodiments, the interface system 106 can be trained by performing various motions and gestures, such that the sample motions and gestures can be stored in the database 460 for further comparison with gestures captured in real time.
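
As a hedged illustration of how the qualifying action module 430 could recognize one of the predetermined motions mentioned above (the circle motion), the total angle swept around the trajectory centroid can be accumulated, with roughly a full turn counting as a circle gesture. The turn threshold and sampling assumptions are illustrative only:

```python
import math
from typing import List, Tuple


def is_circle_gesture(path: List[Tuple[float, float]], min_turn_deg: float = 300.0) -> bool:
    """Return True if the 2D trajectory sweeps roughly a full turn around its centroid."""
    if len(path) < 8:
        return False
    cx = sum(p[0] for p in path) / len(path)
    cy = sum(p[1] for p in path) / len(path)
    angles = [math.atan2(y - cy, x - cx) for x, y in path]
    total = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        d = a1 - a0
        # Unwrap the angle difference into (-pi, pi] before accumulating.
        while d > math.pi:
            d -= 2 * math.pi
        while d <= -math.pi:
            d += 2 * math.pi
        total += d
    return abs(math.degrees(total)) >= min_turn_deg
```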

In some other embodiments, the qualifying action may include a gaze of the user 102 towards the predetermined direction associated with one or more electronic devices 210. The gaze of the user 102 can be determined based on one or more of the following: a position of the eyes of the user 102, a position of the pupils or a contour of the irises of the eyes of the user 102, a position of the head of the user 102, an angle of inclination of the head of the user 102, and a rotation of the head of the user 102.
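
A minimal sketch of one way such a gaze test could be performed is shown below, assuming head yaw and pitch angles have already been estimated from the 3D images; the gaze is treated as a ray from the head position, and a device counts as looked at when the ray deviates from the device direction by less than a threshold. The angle conventions and threshold are assumptions:

```python
import numpy as np


def gaze_direction(yaw_deg: float, pitch_deg: float) -> np.ndarray:
    """Unit gaze vector from head yaw (left/right) and pitch (up/down), in degrees."""
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    return np.array([
        np.cos(pitch) * np.sin(yaw),   # x: lateral component
        np.sin(pitch),                 # y: vertical component
        np.cos(pitch) * np.cos(yaw),   # z: forward component
    ])


def looks_at_device(head_pos, yaw_deg, pitch_deg, device_pos, max_angle_deg: float = 15.0) -> bool:
    """True if the gaze ray deviates from the head-to-device direction by less than the threshold."""
    gaze = gaze_direction(yaw_deg, pitch_deg)
    to_device = np.asarray(device_pos, dtype=float) - np.asarray(head_pos, dtype=float)
    to_device = to_device / np.linalg.norm(to_device)
    angle = np.degrees(np.arccos(np.clip(float(gaze @ to_device), -1.0, 1.0)))
    return angle <= max_angle_deg
```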

In some additional embodiments, the qualifying action may include a voice command generated by the user 102. Voice commands can be captured by the input unit 240 and processed by the computing unit 110 in order to recognize the command and compare it to a predetermined list of voice commands to find a match.

In some additional embodiments, the qualifying action may include receipt and identification of biometric data associated with the user 102. The biometric data may include a face, voice, motion dynamics patterns, and so forth. Based on the captured biometric data, the computing unit 110 may authenticate a particular user 102 to control one or another electronic device 210. Such a feature may prevent, for example, children from operating dangerous or unwanted electronic devices 210.
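
By way of a hypothetical sketch, such an authorization step could check the identity produced by face or voice recognition against a per-user list of permitted devices before a command is issued; the user and device names below are placeholders, not part of the disclosure:

```python
from typing import Dict, Optional, Set


class BiometricGate:
    def __init__(self, permissions: Dict[str, Set[str]]):
        # permissions: recognized user id -> devices that user may control
        self.permissions = permissions

    def authorize(self, user_id: Optional[str], device: str) -> bool:
        """Allow the command only for a recognized user permitted on that device."""
        if user_id is None:
            return False  # unrecognized user: block the command
        return device in self.permissions.get(user_id, set())


if __name__ == "__main__":
    gate = BiometricGate({"parent": {"tv", "game_console", "lights"},
                          "child": {"lights"}})
    print(gate.authorize("child", "game_console"))  # -> False
```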

In some embodiments, the interface system 106 may require that the user 102 perform at least two or more qualifying actions. For example, to generate a particular control command for an electronic device 210, the user 102 shall first point to a certain electronic device 210 using the object 104, then make a predetermined gesture using the object 104, and then provide a voice command or perform another gesture. Given that various qualifying actions are provided, it should be understood that multiple combinations can be used to generate different control commands.

The comparing module 440 can be configured to compare the captured qualifying action to one or more predetermined actions associated with the direction towards which the object 104 was oriented (as defined by the orientation module 420). The aforementioned one or more predetermined actions can be stored in the database 460.

The command generator 450 can be configured to selectively provide to one or more electronic devices 210, based on the comparison performed by the comparing module 440, a command associated with at least one qualifying action that is identified by the qualifying action module 430. Accordingly, each control command, among a plurality of control commands stored in the database 460, is predetermined for a certain electronic device 210 and is associated with one or more certain gestures or motions and the location of the electronic device 210.

The database 460 can be configured to store predetermined gestures, motions, qualifying actions, voice commands, control commands, electronic device location data, visual hulls or representations of users and objects, and so forth.
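
The following is a hedged sketch of the kinds of records such a database could hold, expressed as simple in-memory data classes; the field names are illustrative, as the disclosure does not prescribe a schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeviceRecord:
    name: str
    location: Tuple[float, float, float]   # position of the device in sensor space


@dataclass
class CommandRecord:
    device: str                            # target electronic device
    qualifying_actions: List[str]          # e.g. ["circle"] or ["point", "voice:turn on"]
    command: str                           # control command to issue when matched


@dataclass
class GestureDatabase:
    devices: Dict[str, DeviceRecord] = field(default_factory=dict)
    commands: List[CommandRecord] = field(default_factory=list)
    gesture_templates: Dict[str, list] = field(default_factory=dict)  # reference trajectories
```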

FIG. 5 is a process flow diagram showing a method 500 for controlling one or more electronic devices 210 by recognition of gestures made by the object 104, according to an example embodiment. The method 500 may be performed by processing logic that may comprise hardware (e.g., dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine), or a combination of both. In one example embodiment, the processing logic resides at the interface system 106.

The method 500 can be performed by the various modules discussed above with reference to FIGS. 2 and 4. Each of these modules can comprise processing logic. It will be appreciated by one of ordinary skill in the art that examples of the foregoing modules may be virtual, and instructions said to be executed by a module may, in fact, be retrieved and executed by a processor. The foregoing modules may also include memory cards, servers, and/or computer discs. Although various modules may be configured to perform some or all of the various steps described herein, fewer or more modules may be provided and still fall within the scope of example embodiments.

As shown in FIG. 5, the method 500 may commence at operation 510, with the one or more 3D-sensors 108 capturing a series of successive 3D images in real time. At operation 520, the identification module 410 identifies the object 104.

At operation 530, the orientation module 420 identifies the orientation of the object 104 and that the object 104 is oriented substantially towards a predetermined direction associated with a particular electronic device 210. Accordingly, the orientation module 420 may track motion of the object 104 in real time.

At operation 540, the qualifying action module 430 determines that at least one qualifying action is performed by the user 102 and/or the object 104. For this purpose, the qualifying action module 430 may track motions of the user 102 and/or the object 104 in real time.

At operation 550, the comparing module 440 compares the qualifying action, as identified at operation 540, to one or more predetermined actions associated with the direction that was identified at operation 530 as the direction towards which the object is substantially oriented.

At operation 560, the command generator 450 selectively provides, to the one or more electronic devices 210, a control command associated with at least one qualifying action and based on the comparison performed at operation 550.

FIG. 6 shows a diagrammatic representation of a computing device for a machine in the example electronic form of a computer system 600, within which a set of instructions for causing the machine to perform any one or more of the methodologies discussed herein can be executed. In example embodiments, the machine operates as a standalone device or can be connected (e.g., networked) to other machines. In a networked deployment, the machine can operate in the capacity of a server, a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine can be a personal computer (PC), tablet PC, set-top box (STB), PDA, cellular telephone, portable music player (e.g., a portable hard drive audio device, such as a Moving Picture Experts Group Audio Layer 3 (MP3) player), web appliance, network router, switch, bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that use or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 600 includes a processor or multiple processors 602 (e.g., a central processing unit (CPU), graphics processing unit (GPU), or both), main memory 604 and static memory 606, which communicate with each other via a bus 608. The computer system 600 can further include a video display unit 610 (e.g., a liquid crystal display (LCD) or cathode ray tube (CRT)). The computer system 600 also includes at least one input device 612, such as an alphanumeric input device (e.g., a keyboard), cursor control device (e.g., a mouse), microphone, digital camera, video camera, and so forth. The computer system 600 also includes a disk drive unit 614, signal generation device 616 (e.g., a speaker), and network interface device 618.

The disk drive unit 614 includes a computer-readable medium 620, which stores one or more sets of instructions and data structures (e.g., instructions 622) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 622 can also reside, completely or at least partially, within the main memory 604 and/or within the processors 602 during execution by the computer system 600. The main memory 604 and the processors 602 also constitute machine-readable media.

The instructions 622 can further be transmitted or received over the network 220 via the network interface device 618 utilizing any one of a number of well-known transfer protocols (e.g., Hyper Text Transfer Protocol (HTTP), CAN, Serial, and Modbus).

While the computer-readable medium 620 is shown in an example embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine, and that causes the machine to perform any one or more of the methodologies of the present application, or that is capable of storing, encoding, or carrying data structures utilized by or associated with such a set of instructions. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. Such media can also include, without limitation, hard disks, floppy disks, flash memory cards, digital video disks, random access memory (RAM), read only memory (ROM), and the like.

The example embodiments described herein can be implemented in an operating environment comprising computer-executable instructions (e.g., software) installed on a computer, in hardware, or in a combination of software and hardware. The computer-executable instructions can be written in a computer programming language or can be embodied in firmware logic. If written in a programming language conforming to a recognized standard, such instructions can be executed on a variety of hardware platforms and for interfaces to a variety of operating systems. Although not limited thereto, computer software programs for implementing the present method can be written in any number of suitable programming languages such as, for example, C, C++, C#, Cobol, Eiffel, Haskell, Visual Basic, Java, JavaScript, Python, or other compilers, assemblers, interpreters or other computer languages or platforms.

Thus, methods and systems for controlling one or more electronic device(s) by recognition of gestures made by an object have been described. The disclosed technique provides a useful tool to enable people to interact with various electronic devices based on gestures, motions, voice commands, and gaze information.

Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

CLAIMS

1. A computer-implemented method for controlling one or more electronic devices by recognition of gestures made by a three-dimensional object, the method comprising: capturing a series of successive 3D images in real time; identifying that the object has a predetermined elongated shape; identifying that the object is oriented substantially towards a predetermined direction; determining at least one qualifying action being performed by a user and/or the object; comparing the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented; and based on the comparison, selectively issuing to the one or more electronic devices a command associated with the at least one qualifying action.
 2. The method of claim 1, wherein the predetermined direction is associated with the one or more electronic devices.
 3. The method of claim 1, wherein the object is selected from a group comprising a wand, an elongated pointing device, an arm, a hand, and one or more fingers of the user.
 4. The method of claim 1, wherein the series of successive 3D images are captured using at least one video camera or a 3D image sensor.
 5. The method of claim 1, wherein identifying the object comprises one or more of: processing the captured series of successive 3D images to generate a depth map; determining geometrical parameters of the object; and identifying the object by matching the geometrical parameters to a predetermined object database.
 6. The method of claim 1, wherein determining at least one qualifying action comprises determining and acknowledging one or more of: a predetermined motion of the object; a predetermined gesture of the object; a gaze of the user towards the predetermined direction associated with one or more electronic devices; a predetermined motion of the user; a predetermined gesture of the user; biometric data of the user; and a voice command provided by the user.
 7. The method of claim 6, wherein biometric data of the user is determined based on one or more of the following: face recognition, voice recognition, user body recognition, and recognition of a user motion dynamics pattern.
 8. The method of claim 6, wherein the gaze of the user is determined based on one or more of the following: a position of eyes of the user, a position of pupils or a contour of irises of the eyes, a position of a head of the user, an angle of inclination of the head, and a rotation of the head.
 9. The method of claim 1, wherein the one or more electronic devices comprise a computer, a game console, a TV set, a TV adapter, a communication device, a Personal Digital Assistant (PDA), a lighting device, an audio system, and a video system.
 10. A system for controlling one or more electronic devices by optical recognition of gestures made by an object, the system comprising: at least one three-dimensional image sensor configured to capture a series of successive 3D images in real time; and a computing unit communicatively coupled to the at least one 3D image sensor, the computing unit being configured to: identify that the object has a predetermined elongated shape; identify that the object is oriented substantially towards a predetermined direction; determine at least one qualifying action being performed by a user and/or the object; compare the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented; and based on the comparison, selectively issue to the one or more electronic devices a command associated with the at least one qualifying action.
 11. The system of claim 10, wherein the at least one 3D image sensor comprises one or more of an infrared (IR) projector to generate modulated light, an IR camera to capture 3D images associated with the object or the user, and a color video camera.
 12. The system of claim 11, wherein the IR projector, the color video camera, and IR camera are installed in a common housing.
 13. The system of claim 11, wherein the color video camera and/or the IR camera are equipped with liquid lenses.
 14. The system of claim 10, wherein the predetermined direction is associated with the one or more electronic devices.
 15. The system of claim 10, wherein the object is selected from a group comprising a wand, an elongated pointing device, an arm, a hand, and one or more fingers of the user.
 16. The system of claim 10, wherein the computing unit is configured to identify the object by performing the acts of: processing the captured series of successive 3D images to generate a depth map; determining geometrical parameters of the object; and identifying the object by matching the geometrical parameters to a predetermined object database.
 17. The system of claim 10, wherein the computing unit is configured to determine at least one qualifying action by performing the acts of determining and acknowledging one or more of: a predetermined motion of the object; a predetermined gesture of the object; a gaze of the user towards the predetermined direction associated with one or more electronic devices; a predetermined motion of the user; a predetermined gesture of the user; biometric data of the user; and a voice command provided by the user.
 18. The system of claim 17, wherein biometric data of the user is determined based on one or more of the following: face recognition, voice recognition, user body recognition, and recognition of a user motion dynamics pattern.
 19. The system of claim 17, wherein the gaze of the user is determined based on one or more of the following: a position of eyes of the user, a position of pupils or a contour of irises of the eyes, a position of a head of the user, an angle of inclination of the head, and a rotation of the head.
 20. A processor-readable medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to: capture a series of successive 3D images in real time; identify that an object has a predetermined elongated shape; identify that the object is oriented substantially towards a predetermined direction; determine at least one qualifying action being performed by a user and/or the object; compare the at least one qualifying action to one or more predetermined actions associated with the direction towards which the object is oriented; and based on the comparison, selectively provide to the one or more electronic devices a command associated with the at least one qualifying action. 